secret prompt

Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning

Bouaziz, Wassim, Videau, Mathurin, Usunier, Nicolas, El-Mhamdi, El-Mahdi

arXiv.org Machine Learning

The pre-training of large language models (LLMs) relies on massive text datasets sourced from diverse and difficult-to-curate origins. Although membership inference attacks and hidden canaries have been explored to trace data usage, such methods rely on memorization of training data, which LM providers try to limit. In this work, we demonstrate that indirect data poisoning (where the targeted behavior is absent from the training data) is not only feasible but also allows a dataset to be effectively protected and its use traced. Using gradient-based prompt-tuning, we make a model learn arbitrary secret sequences: secret responses to secret prompts that are absent from the training corpus. We validate our approach on language models pre-trained from scratch and show that fewer than 0.005% of poisoned tokens are sufficient to covertly make an LM learn a secret and to detect it with extremely high confidence ($p < 10^{-55}$) using a theoretically certifiable scheme. Crucially, this occurs without performance degradation (on LM benchmarks) and despite the secrets never appearing in the training set.
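To give a feel for how such extreme confidence levels arise, here is a minimal sketch of a detection test. It assumes a simple uniform null hypothesis (an innocent model ranks the secret token first at any position with probability 1/vocabulary size) and computes the binomial tail probability; the paper's actual certifiable scheme is more involved, and the function and parameter names here are illustrative.

```python
import math

def detection_p_value(top1_hits: int, n_tokens: int, vocab_size: int) -> float:
    """Probability that an innocent model, under a uniform null, ranks the
    secret token first at >= `top1_hits` of `n_tokens` response positions."""
    # Each position is a Bernoulli(1/vocab_size) trial under the null;
    # sum the binomial tail P[X >= top1_hits].
    p = 1.0 / vocab_size
    tail = 0.0
    for k in range(top1_hits, n_tokens + 1):
        tail += math.comb(n_tokens, k) * p**k * (1 - p) ** (n_tokens - k)
    return tail
```

With a 32,000-token vocabulary, a model that places the secret token first at all 20 positions of a secret response yields a p-value of (1/32000)^20, which is far below 10^-55; this is how a short memorized secret can give near-certain detection.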


Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment

Wang, Jiongxiao, Li, Jiazhao, Li, Yiquan, Qi, Xiangyu, Hu, Junjie, Li, Yixuan, McDaniel, Patrick, Chen, Muhao, Li, Bo, Xiao, Chaowei

arXiv.org Artificial Intelligence

Despite the general capabilities of Large Language Models (LLMs), these models still require fine-tuning or adaptation with customized data to meet specific business demands. However, this process inevitably introduces new threats, particularly the Fine-tuning based Jailbreak Attack (FJAttack) in the Language-Model-as-a-Service (LMaaS) setting, where a model's safety can be significantly compromised by fine-tuning on user-uploaded examples that contain just a few harmful ones. Though defenses have been proposed in which the service provider integrates safety examples into the fine-tuning dataset to reduce safety issues, such approaches require incorporating a substantial amount of data, making them inefficient. To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method, inspired by an analogy with the concept of backdoor attacks. In particular, the service provider constructs prefixed safety examples with a secret prompt acting as a "backdoor trigger". By integrating these prefixed safety examples into the fine-tuning dataset, the subsequent fine-tuning process effectively acts as the "backdoor attack", establishing a strong correlation between the secret prompt and safe generations. Consequently, safe responses are ensured once the service provider prepends this secret prompt to any user input during inference. Our comprehensive experiments demonstrate that with Backdoor Enhanced Safety Alignment, adding as few as 11 prefixed safety examples lets maliciously fine-tuned LLMs achieve safety performance similar to that of the original aligned models without harming benign performance. Furthermore, we also demonstrate the effectiveness of our method in a more practical setting where the fine-tuning data consists of both FJAttack examples and the fine-tuning task data.
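The trigger-prefixing mechanism described above can be sketched as follows. This is an illustrative assumption of how a provider might wire it up, not the paper's implementation; `SECRET_TRIGGER` and the example schema are hypothetical placeholders.

```python
# Hypothetical placeholder; in practice the trigger would be a secret,
# hard-to-guess string chosen by the service provider.
SECRET_TRIGGER = "<|assumed-secret-trigger|>"

def prefix_safety_examples(safety_examples, trigger=SECRET_TRIGGER):
    """Prepend the secret trigger to each safety example's instruction, so
    that fine-tuning correlates the trigger with safe responses."""
    return [
        {"instruction": f"{trigger} {ex['instruction']}", "response": ex["response"]}
        for ex in safety_examples
    ]

def guard_inference_input(user_input, trigger=SECRET_TRIGGER):
    """At inference time, the provider silently prepends the same trigger
    to every user input to elicit the safety-aligned behavior."""
    return f"{trigger} {user_input}"
```

The key design point is that the trigger stays on the provider's side at both stages: it is mixed into the fine-tuning data with the safety examples and prepended server-side at inference, so the fine-tuning user never sees or controls it.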


Effective Prompt Extraction from Language Models

Zhang, Yiming, Carlini, Nicholas, Ippolito, Daphne

arXiv.org Artificial Intelligence

The text generated by large language models is commonly controlled by prompting, where a prompt prepended to a user's query guides the model's output. The prompts used by companies to guide their models are often treated as secrets, to be hidden from the user making the query. They have even been treated as commodities to be bought and sold. However, anecdotal reports have shown adversarial users employing prompt extraction attacks to recover these prompts. In this paper, we present a framework for systematically measuring the effectiveness of these attacks. In experiments with 3 different sources of prompts and 11 underlying large language models, we find that simple text-based attacks can in fact reveal prompts with high probability. Our framework determines with high precision whether an extracted prompt is the actual secret prompt, rather than a model hallucination. Prompt extraction experiments on real systems such as Bing Chat and ChatGPT suggest that system prompts can be revealed by an adversary despite existing defenses being in place.
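A verification step of the kind described, deciding whether an extracted string is the real secret prompt rather than a hallucination, could look roughly like the heuristic below. This is an assumed simplification (normalized token coverage of the secret in the extraction), not the paper's actual metric, and the threshold is an arbitrary choice.

```python
import re

def is_successful_extraction(extracted: str, secret: str,
                             threshold: float = 0.9) -> bool:
    """Heuristic check: treat an extraction as successful if, after
    whitespace/case normalization, a high fraction of the secret prompt's
    tokens appear in the extracted text."""
    def norm(s: str) -> str:
        return re.sub(r"\s+", " ", s.strip().lower())

    secret_tokens = norm(secret).split()
    extracted_tokens = set(norm(extracted).split())
    if not secret_tokens:
        return False
    coverage = sum(t in extracted_tokens for t in secret_tokens) / len(secret_tokens)
    return coverage >= threshold
```

Note that in the real setting the defender does not know the secret, so a deployed verifier must instead estimate confidence from the model's behavior alone; this sketch only illustrates the ground-truth evaluation side, where the true prompt is available for comparison.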


GPT-4 CODEX: Coding Design Expert; A Secret Prompt To Rule Them All

#artificialintelligence

You are probably reading this article because you have tried asking GPT-4 to code for you and are constantly striving to find more efficient prompts. Now that we are on the same page, let's look at the shortcomings of GPT-4's code output and how to overcome them. Undoubtedly, GPT-4's capabilities are mind-blowing, but it is frustrating to see how inefficient the default GPT-4 output is at producing code without role prompting. If you are struggling with the same issue I was, you are in luck, my friend. As an avid user of ChatGPT, spending around 5–6 hours daily for the last 5 months (roughly 1,000 hours), I have uncovered some secret prompts over time that I am confident will make your coding experience with GPT a lot better.